GENERATION OF LIBRARY OF SOLUBLE RANDOM POLYPEPTIDES LINKED TO mRNA

ABSTRACT

Methods and compositions are provided for producing libraries of soluble random polypeptides. In the methods, the fraction of hydrophilic residues in the polypeptide is controlled so as to maintain the solubility of the polypeptide constructs.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of co-pending U.S. application Ser. No. 14/741,265, filed Jun. 16, 2015, which is a continuation of U.S. application Ser. No. 12/525,437, filed Jul. 31, 2009, now U.S. Pat. No. 9,080,256 issued Jul. 14, 2015, which is the U.S. national phase under 35 U.S.C. § 371 of PCT International Application No. PCT/US2008/053757 which has an International Filing date of Feb. 12, 2008, and which claims priority to U.S. Provisional Application No. 60/900,869, filed Feb. 12, 2007. The priority applications are incorporated herein by reference in their entirety.

BACKGROUND OF THE INVENTION

Field of the Invention

The present invention relates to compositions and methods for producing libraries of soluble random polypeptides.

Description of the Related Art

In vitro evolution of proteins is a process in which a starting population of proteins, which may have desirable properties, is subjected to rounds of selection and mutation in order to evolve proteins having improved properties. For example, proteins can be selected for their binding properties to targets such as receptors. The proteins may be linked to their encoding polynucleotides as in RNA display, ribosome display, phage display etc., and after recovery of a subset of proteins having a desirable property, the polynucleotides encoding those proteins may be subjected to mutation in order to obtain a population of proteins for use in a further round of selection. In this way, proteins having better properties may be quickly obtained by evolution. Systems for accomplishing such in vitro evolution of proteins are disclosed, for example, in U.S. patent application Ser. No. 11/415,844, which is incorporated herein by reference in its entirety.

Often, when the proteins are attached to a large soluble entity, such as their mRNA, the entity acts as a solubility tag to keep the ensemble in solution. In such cases, when the protein is dissociated from the tag, it falls out of solution. Because the evolution step did not use a selection step or steps based on solubility, very little of the results may be usable. Thus, the construction of libraries of soluble protein constructs from which to make functional selections has become more important (Eur. J. Biochem. 271, 1595-1608; FEBS 2004). Libraries that lack a stop codon can be constructed, but they provide proteins that are not necessarily soluble. In one notable example, Cho et al. constructed a library and selected therefrom an ATP binding protein, bound to its mRNA. However, when separated from their bound mRNA, the proteins thus selected were highly insoluble. Cho, G., Keefe, A. D., Liu, R., Wilson, D. S., and Szostak, J. W. (2000) J. Mol. Biol. 297, 309-319, which is incorporated herein by reference in its entirety. Only a fraction of each clone appeared folded and functional; the proteins themselves tend to aggregate when expressed as free proteins. It has been hypothesized that selection of these proteins was likely facilitated by the improved solubility imparted by the mRNA-cDNA tail, which indicates such sequences would not be found in a typical phage-display selection. Takahashi, T. et al., TRENDS in Biochemical Sciences, Vol. 28, No. 3, March 2003, which is incorporated herein by reference in its entirety. The method described by Cho et al. employed a 109 amino acid construct, of which 80 amino acids were random. Cho, G., Keefe, A. D., Liu, R., Wilson, D. S., and Szostak, J. W. (2000) J. Mol. Biol. 297, 309-319. The Cho et al. method did not involve biasing the codons. The 29 amino acids at the construct ends were not identified, but unless they were markedly biased one could not expect this population to be soluble, on average.

It has been suggested that the insolubility of functional clones likely reflects the relative paucity of proteins that are both folded and functional in the vastness of sequence space. Takahashi, T. et al., TRENDS in Biochemical Sciences, Vol. 28, No. 3, March 2003. Thus, there is a need for methods for the preparation of soluble proteins, and libraries of soluble proteins.

SUMMARY OF THE INVENTION

In one embodiment, a method of producing a library of soluble polypeptides linked to polynucleotides encoding the polypeptides is provided that comprises: synthesizing a plurality of polynucleotides; producing encoded polypeptides from the polynucleotides; and linking each polypeptide to the polynucleotide that encodes it; wherein a proportion of hydrophilic or surface compatible amino acid residues in each polypeptide is selected so that the polypeptide will be soluble.

In a further embodiment, the polypeptides can be approximately 100 amino acid residues in length. In a further embodiment, the polynucleotides can be synthesized using uncapped trimer phosphoramidites. In a further embodiment, the polynucleotides do not contain stop codons.

In a further embodiment, the hydrophilic or surface compatible amino acid residues can be selected from the group consisting of Asp, Arg, Ser, Gln, Asn, Lys, Glu, Gly, Pro, and combinations thereof. In a further embodiment, the hydrophilic or surface compatible amino acid residues can be selected from the group consisting of Arg, Lys, Asn, His, Pro and Asp and combinations thereof. In a further embodiment, at least one proline or glycine residue can be included in each polypeptide.

In a further embodiment, the proportion of hydrophilic or surface compatible amino acid residues in each polypeptide can be selected using the following formula:

$c = {\left( \frac{8}{0.41\; N} \right)^{\frac{1}{3}} \pm y}$

where N=the number of amino acids in the molecule, c=fraction of amino acids that are hydrophilic or surface compatible, and y is selected from the group consisting of 0.01, 0.02, 0.03, 0.04 and 0.05.

In a further embodiment, the proportion of hydrophilic or surface compatible amino acid residues in each polypeptide can be selected using the following formula:

$c \geq {\left( \frac{8}{0.41\; N} \right)^{\frac{1}{3}} - 0.05}$

wherein N=the number of amino acids in each polypeptide, and c=fraction of amino acids that are hydrophilic.

In a further embodiment, the proportion of hydrophilic or surface compatible amino acid residues in each polypeptide can be selected using the following formula:

$c = {\left( \frac{8}{0.41\; N} \right)^{\frac{1}{3}} \pm 0.05}$

wherein N=the number of amino acids in each polypeptide, and c=fraction of amino acids that are hydrophilic.

In one embodiment, a library of soluble polypeptides linked to polynucleotides encoding the polypeptides is provided that comprises: a library of soluble polypeptides linked to polynucleotides encoding the polypeptides produced by a method that comprises: synthesizing a plurality of polynucleotides; producing encoded polypeptides from the polynucleotides; and linking each polypeptide to the polynucleotide that encodes it; wherein a proportion of hydrophilic or surface compatible amino acid residues in each polypeptide is selected so that the polypeptide will be soluble.

In a further embodiment, the library can be a phage library. In a further embodiment, the library can reside within a eukaryotic cell. In a further embodiment, the library can be a ribosome display library. In a further embodiment, the library can be an RNA display library. In a further embodiment, the library can be a plasmid display library. In a further embodiment, the polynucleotides do not contain stop codons.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 depicts the distribution of a population of proteins 100 amino acids in length by hydrophobic amino acid fraction, where the sequences are selected from a population of 50% hydrophilic and 50% hydrophobic amino acids, which is roughly the ratio of the 61 sense codons. (From Min-yi Shen, Fred P. Davis, Andrej Sali; The optimal size of a globular protein domain: A simple sphere-packing model; Chemical Physics Letters 405 (2005) 224-228.)

FIG. 2 depicts a flow chart describing various possible steps that can be taken during some of the disclosed embodiments.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

The present disclosure relates to compositions and methods to generate libraries of random polypeptides which are intrinsically soluble. In various embodiments, methods for preparing libraries of soluble proteins are provided. Advantageously, the libraries disclosed herein can be in a variety of different formats, including RNA display, ribosome display, phage display etc. Some embodiments described herein relate to preparation of soluble polypeptides by controlling the ratio of hydrophilic to hydrophobic amino acids present in the polypeptides. Some embodiments described herein relate to preparation of soluble polypeptides by controlling the content of surface compatible amino acids in the polypeptides. In various embodiments disclosed herein, methods are provided for the selection of a soluble product of in vitro evolution. Various embodiments described herein relate to preparation of soluble polypeptides linked to polynucleotides encoding the polypeptides.

As will be appreciated by one of skill in the art, the ability to make soluble polypeptides can have great benefit in a wide variety of applications, including, for example, increasing accessibility to selection systems. The methods disclosed herein, and libraries generated by such methods, are useful, for example, in in vitro evolution processes to develop polypeptides that have desirable properties. Such properties include, for example, enhanced binding to a target protein, such as a receptor, or the development of polypeptides for use in vaccines.

The above and additional embodiments are discussed in more detail below, after a brief discussion of the definitions of some of the terms used in the specification.

The section headings used herein are for organizational purposes only and are not to be construed as limiting the described subject matter in any way. All literature and similar materials cited in this application, including but not limited to, patents, patent applications, articles, books, treatises, and internet web pages are expressly incorporated by reference in their entirety for any purpose. When definitions of terms in incorporated references appear to differ from the definitions provided in the present teachings, the definition provided in the present teachings shall control. It will be appreciated that there is an implied “about” prior to the temperatures, concentrations, times, etc. discussed in the present teachings, such that slight and insubstantial deviations are within the scope of the present teachings herein. In this application, the use of the singular includes the plural unless specifically stated otherwise. Also, the use of “comprise”, “comprises”, “comprising”, “contain”, “contains”, “containing”, “include”, “includes”, and “including” is not intended to be limiting. It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention.

Unless otherwise defined, scientific and technical terms used in connection with the invention described herein shall have the meanings that are commonly understood by those of ordinary skill in the art. Further, unless otherwise required by context, singular terms shall include pluralities and plural terms shall include the singular. As utilized in accordance with the embodiments provided herein, the following terms, unless otherwise indicated, shall be understood to have the following meanings:

The terms “protein,” “peptide,” and “polypeptide” are defined herein to mean a polymeric molecule of two or more units comprised of amino acids in any form (e.g., D- or L-amino acids, synthetic or modified amino acids capable of polymerizing via peptide bonds, etc.), and these terms may be used interchangeably herein.

The term “linked” as used herein means associated. Linked entities as described herein can be, but need not be, directly physically connected to each other. In some embodiments, linked entities are directly physically attached to each other. One example of a direct physical linkage is a polypeptide-mRNA fusion. Another example of a direct physical linkage is a ribosome linking an mRNA to the polypeptide synthesized from the mRNA. In other embodiments, entities can be indirectly linked by association with the same phage, ribosome, or cell, but are not necessarily directly physically attached to each other. Examples of indirect linkages include, without limitation: a phage containing a polynucleotide and expressing the polypeptide encoded by the polynucleotide, and a cell containing a polynucleotide and expressing the polypeptide encoded by the polynucleotide.

The term “surface compatible” as used herein means capable of existing on a surface of a polypeptide without substantially contributing to insolubility of the polypeptide. The amino acids Asp, Arg, Ser, Gln, Asn, Lys, Glu, Gly and Pro are considered “surface compatible.”

“Nucleotide” refers to a phosphate ester of a nucleoside, as a monomer unit or within a nucleic acid. “Nucleotide 5′-triphosphate” refers to a nucleotide with a triphosphate ester group at the 5′ position, and is sometimes denoted as “NTP”, or “dNTP” and “ddNTP” to particularly point out the structural features of the ribose sugar. The triphosphate ester group can include sulfur substitutions for the various oxygens, e.g., α-thio-nucleotide 5′-triphosphates. For a review of nucleic acid chemistry, see: Shabarova, Z. and Bogdanov, A. Advanced Organic Chemistry of Nucleic Acids, VCH, New York, 1994.

The term “nucleic acid” refers to natural nucleic acids, artificial nucleic acids, analogs thereof, or combinations thereof.

As used herein, the terms “polynucleotide” and “oligonucleotide” are used interchangeably and mean single-stranded and double-stranded polymers of nucleotide monomers (nucleic acids), including, but not limited to, 2′-deoxyribonucleotides (DNA) and ribonucleotides (RNA) linked by internucleotide phosphodiester bond linkages, e.g. 3′-5′ and 2′-5′, inverted linkages, e.g. 3′-3′ and 5′-5′, branched structures, or analog nucleic acids. Polynucleotides have associated counter ions, such as H⁺, NH₄⁺, trialkylammonium, Mg²⁺, Na⁺ and the like. A polynucleotide can be composed entirely of deoxyribonucleotides, entirely of ribonucleotides, or chimeric mixtures thereof. Polynucleotides can be comprised of nucleobase and sugar analogs. Polynucleotides typically range in size from a few monomeric units, e.g. 5-40, when they are more commonly referred to in the art as oligonucleotides, to several thousands of monomeric nucleotide units. Unless denoted otherwise, whenever a polynucleotide sequence is represented, it will be understood that the nucleotides are in 5′ to 3′ order from left to right and that “A” denotes deoxyadenosine, “C” denotes deoxycytidine, “G” denotes deoxyguanosine, and “T” denotes thymidine.

The term “tRNA molecule”, as used herein, shall be given its ordinary meaning and shall also mean a stable aminoacyl tRNA analog (SATA), a Linking tRNA Analog, and a Nonsense Suppressor Analog, all of which are described herein. A tRNA molecule includes native tRNA, synthetic tRNA, a combination of native and synthetic tRNA, and any modifications thereof. In a preferred embodiment, the tRNA is connected to the nascent peptide by the ribosomal peptidyl transferase and to the mRNA through an ultraviolet induced crosslink between the anticodon of the tRNA molecule and the codon of the RNA message. This can be done by, for example, thiouracil, bromouracil, and the like. In one preferred embodiment, the linker is a psoralen crosslink made from a psoralen monoadduct, a non-psoralen crosslinker, or analogs or modifications thereof, pre-placed on either the mRNA's last translatable codon or preferably on the tRNA anticodon of choice. Preferably, a tRNA stop anticodon is selected. A stop codon/anticodon pair selects for full length transcripts. One skilled in the art will understand that an mRNA not having a stop codon may also be used and, further, that any codon or nucleic acid triplet may be used in accordance with several embodiments of the current invention. A tRNA having an anticodon which is not naturally occurring can be synthesized according to methods known in the art.

In one embodiment, the anticodon of the tRNA is capable of forming a crosslink to the mRNA, where the crosslink is selected from the group consisting of one or more of the following: 2-thiocytosine, 2-thiouridine, 4-thiouridine, 5-iodocytosine, 5-iodouridine, 5-bromouridine and 2-chloroadenosine, aryl azides, and modifications or analogues thereof. These crosslinkers are available commercially from Ambion, Inc. (Austin, Tex.), Dharmacon, Inc. (Lafayette, Colo.), and other well-known manufacturers of scientific materials.

The term “pseudo stop codon” is defined herein to mean a codon which, while not naturally a nonsense codon, prevents a message from being further translated. A pseudo stop codon may be created by using a “stable aminoacyl tRNA analog” or SATA, as described below. In this manner, a pseudo stop codon is a codon which is recognized by and binds to a SATA. Another method by which to create a pseudo stop codon is to create an artificial system in which the necessary tRNA having an anticodon complementary to the pseudo stop codon is substantially depleted. Accordingly, translation will stop when the absent tRNA is required, e.g., at the pseudo stop codon.

Selection of Amino Acids for Soluble Polypeptides

Many uses for polypeptides require aqueous solubility. Using the methods disclosed herein, one can construct libraries that are biased to produce soluble polypeptides.

Solubility of a polypeptide is dependent on, among other protein characteristics, the proportion of hydrophobic residues therein and the length of the polypeptide. Generally, for a polypeptide to be soluble, the polypeptide preferably has an adequate number of hydrophilic amino acids available to coat the hydrophobic core, regardless of any folding constraints. Since the surface to volume ratio of a molecule increases with decreasing size, this ratio will differ for different sizes or lengths of sequences.

A simple model shows that for unbiased amino acid content, the optimal size for solubility is ~150 amino acids. Shen, M.-Y., Davis, F., Sali, A. Chemical Physics Letters (2005) 405, pp. 224-228, which is incorporated herein by reference in its entirety. According to this simple model, at least 90% of polypeptides 109 amino acids long would be expected to be insoluble. If amino acids are considered as hydrophobic versus hydrophilic, or surface versus core occupiers, the distribution of different ratios in a population can be seen as a binomial with the percentages of the two types as the probabilities p and 1−p:

$\frac{n!}{{r!}{\left( {n - r} \right)!}}{p^{r}\left( {1 - p} \right)}^{n - r}$

where r is the number of amino acids of the first class present in a sequence, (n−r) is the number of the second class, and n=the length of the sequence. Thus, for a protein population having 50% hydrophobic and 50% hydrophilic amino acids, the population distribution by hydrophobic amino acid fraction for n=100 would look like that shown in FIG. 1.
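The binomial expression above can be evaluated directly. The following is a minimal Python sketch, not part of the original disclosure; the function name `fraction_distribution` is illustrative, and p=0.5 with n=100 are the values stated in the text for FIG. 1.

```python
from math import comb

def fraction_distribution(n, p):
    """Return P(r) for r = 0..n residues of the first class (probability p each)
    in a sequence of length n, using n!/(r!(n-r)!) * p^r * (1-p)^(n-r)."""
    return [comb(n, r) * p**r * (1 - p)**(n - r) for r in range(n + 1)]

# n = 100 residues, 50% hydrophobic / 50% hydrophilic, as in the text.
dist = fraction_distribution(100, 0.5)

# Most probable number of hydrophobic residues.
mode = max(range(len(dist)), key=dist.__getitem__)
print(mode)                  # 50
print(round(sum(dist), 6))   # 1.0
```

Summing `dist[r]` over a range of r gives the share of an unbiased population falling in that range of hydrophobic fraction, which is one way to gauge how rarely an unbiased sequence would meet a given hydrophilic target.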

One of the primary factors affecting polypeptide solubility is the exposure of hydrophobic amino acids to water solvent. For increased solubility, it is generally desirable for polypeptides to have their hydrophobic amino acids situated away from the water solvent. This is, in a way, a constraint of surface to volume ratio. The smaller the polypeptide, the greater the surface to volume ratio will be. Because surface to volume ratio changes with protein length, the proportion of hydrophilic to hydrophobic amino acids can be adjusted according to the number of total amino acids in the polypeptide to bias the ratio towards solubility.

For a completely hydrophilic amino acid polypeptide population, there will be no structure because the molecules will be able to completely unfold. When this is compared to a population with a very low but nonzero hydrophobic amino acid fraction, the hydrophilic amino acid residues have fewer ways to distribute themselves and still shield the hydrophobic amino acid residues from the solvent. As the hydrophobic amino acid fraction of the polypeptide increases, the constraints on the positioning of hydrophilic amino acid residues increase, thus leading to a greater tendency towards structure. This tendency progresses next to “random coils” then to “intrinsically disordered polypeptides” then to “molten globules” and rigid, highly structured globules and finally to insoluble aggregates. Thus, as the hydrophobic amino acid population of a polypeptide increases from zero to 100%, the structure of the polypeptide changes as follows: no structure → random coils → intrinsically disordered polypeptides → molten globules → rigid, highly structured globules → insoluble aggregates.

Generally, for a given length of polypeptide, there will be a hydrophilic to hydrophobic ratio that is optimal for intrinsically disordered polypeptides, a lower hydrophilic to hydrophobic ratio for molten globules, and an even lower hydrophilic to hydrophobic ratio for rigid globules, below which there is little chance of solubility.

In preparing soluble polypeptide libraries, the distribution of hydrophobic and hydrophilic or surface compatible amino acid residues in the polypeptides can be selected to bias content towards solubility by selecting the proportion of amino acids that are hydrophilic or surface compatible so that the polypeptides will be soluble. The following formula can be used to select the proportion of amino acids that are hydrophilic or surface compatible so that the polypeptides will be soluble:

$c = \left( \frac{8}{0.41\; N} \right)$

where N=the number of amino acids in the polypeptide and c=fraction of amino acids that are hydrophilic or surface compatible. This formula is derived as follows:

$N = \frac{8}{0.41\; c^{3}}$ $c^{3} = \frac{8}{0.41\; N}$$c = \left( \frac{8}{0.41\; N} \right)^{\frac{1}{3}}$

where N=the number of amino acids in the polypeptide, and c=fraction of amino acids that are hydrophilic or surface compatible. In one example using the above formula, for N=100, the proportion of the surface compatible amino acids would be 58% and the interior 42%. Using the above formula as a guideline for selecting the proportion of amino acids that are hydrophilic or surface compatible, the adjustment of the distribution of hydrophobic and hydrophilic, or “surface compatible” amino acid residues in the polypeptides can be performed until a desired level of solubility is achieved.
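As a worked illustration of the formula c = (8/(0.41 N))^(1/3), the following minimal Python sketch evaluates c for several lengths. The function name `surface_fraction` and the particular lengths other than N=100 are illustrative assumptions, not taken from the disclosure.

```python
def surface_fraction(n_residues):
    """Fraction c of hydrophilic or surface compatible residues for a
    polypeptide of n_residues amino acids: c = (8 / (0.41 * N)) ** (1/3)."""
    if n_residues <= 0:
        raise ValueError("n_residues must be positive")
    return (8.0 / (0.41 * n_residues)) ** (1.0 / 3.0)

# N = 100 is the example given in the text; the other lengths are illustrative.
for n in (50, 100, 150, 200):
    print(n, round(surface_fraction(n), 2))

print(round(surface_fraction(100), 2))  # 0.58, i.e. 58% surface, 42% interior
```

Note that the expression exceeds 1 for N below about 20, so this sketch, which does not clamp the result, is only meaningful for longer polypeptides.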

The amino acids Asp, Arg, Ser, Gln, Asn, Lys, Glu, Gly and Pro are considered “surface compatible.” In addition to selecting the proportion of amino acids that are hydrophilic or surface compatible, sharp turns in the backbone for amino acids will also help with solubility, especially as the polypeptide gets smaller. Thus, in some embodiments, the addition of glycine and proline can also be included in the bias. In some embodiments involving simple binding polypeptides, a loose polypeptide configuration with high solubility can often be adequate. Thus, in such situations where loose polypeptide configurations and high solubility are sufficient, fewer mutation steps are required. In other embodiments requiring a tighter, more rigid polypeptide structure it can be useful to go more toward the limit of solubility. In other words, in some embodiments, it can be desirable to adjust the distribution of hydrophobic and hydrophilic amino acid residues in the polypeptides such that the polypeptide is nearly insoluble. This can be done by, for example, starting closer to the limit of solubility or by continuing to mutate additional times.

It is estimated that each mutation step changes the proportion of hydrophilic and hydrophobic amino acids in a polypeptide toward the mean of the random codon triplets by about 1.5% if the sequences are selected from a population of 50% hydrophilic and 50% hydrophobic amino acids, which is roughly the ratio of the 61 sense codons. This assumes a mutation rate of one mutation for each sequence on average. Different rates would give different changes in the proportion.

In some embodiments, the following formula can be used to select the proportion of amino acids that are hydrophilic or surface compatible so that the polypeptides will be soluble:

$c = {\left( \frac{8}{0.41\; N} \right)^{\frac{1}{3}} \pm y}$

where N=the number of amino acids in the molecule, c=fraction of amino acids that are hydrophilic or surface compatible, and y can be about 0, 0.01, 0.02, 0.03, 0.04 or 0.05.

In preferred embodiments, the following formula can be used to select the proportion of amino acids that are hydrophilic or surface compatible so that the polypeptides will be soluble:

$c \geq {\left( \frac{8}{0.41\; N} \right)^{\frac{1}{3}} - 0.05}$

where N=the number of amino acids in the molecule, and c=fraction of amino acids that are hydrophilic or surface compatible.

In other preferred embodiments, the following formula can be used to select the proportion of amino acids that are hydrophilic or surface compatible so that the polypeptides will be soluble:

$c = {\left( \frac{8}{0.41\; N} \right)^{\frac{1}{3}} \pm 0.05}$

where N=the number of amino acids in the molecule, and c=fraction of amino acids that are hydrophilic or surface compatible.

When the combined fraction of the surface compatible amino acids is much greater than c, the polypeptide population will be less rigid on average and have fewer constraints on its configuration. As the combined fraction of the hydrophilic or surface compatible amino acids approaches c, the polypeptide will become more rigid and have fewer ways to fold.

FIG. 2 illustrates one embodiment for preparing libraries of soluble polypeptides. A plurality of random polypeptides approximately N residues in length can be used as a starting point (FIG. 2 at 100). N can be any number of amino acid residues, such as, for example without limitation, any number from about 2 to about 10,000 amino acids. In some embodiments, N can be from about 10 to about 1000 amino acids. In other embodiments, N can be from about 50 to about 500 amino acids. In other embodiments, N can be from about 80 to about 150 amino acids. In other embodiments, N can be about 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119 or 120. A proportion c can be calculated based on N, which is the number of amino acid residues in each polypeptide, as follows:

$c \geq {\left( \frac{8}{0.41\; N} \right)^{\frac{1}{3}} - 0.05}$

where N=the number of amino acids in each polypeptide (FIG. 2 at 110). The new proportion of hydrophilic or surface compatible amino acids for each polypeptide is chosen to be c as calculated above for each polypeptide. The amino acid content of each polypeptide is adjusted so that the proportion of hydrophilic or surface compatible amino acids in each polypeptide is c as calculated above in (110), thereby providing a library of soluble polypeptides (FIG. 2 at 120).
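The calculation at step 110 can be sketched as a simple check on a candidate sequence. The following Python sketch is illustrative only: the function names, the use of one-letter amino acid codes, and the test sequences are assumptions; the surface compatible set (Asp, Arg, Ser, Gln, Asn, Lys, Glu, Gly, Pro) and the 0.05 margin come from the text.

```python
# One-letter codes for Asp, Arg, Ser, Gln, Asn, Lys, Glu, Gly, Pro.
SURFACE_COMPATIBLE = frozenset("DRSQNKEGP")

def surface_fraction_target(n_residues):
    """c = (8 / (0.41 * N)) ** (1/3) for a polypeptide of N residues."""
    return (8.0 / (0.41 * n_residues)) ** (1.0 / 3.0)

def observed_fraction(sequence):
    """Fraction of residues in the sequence that are surface compatible."""
    return sum(aa in SURFACE_COMPATIBLE for aa in sequence) / len(sequence)

def meets_target(sequence, margin=0.05):
    """True if the observed fraction is at least the formula value minus margin."""
    return observed_fraction(sequence) >= surface_fraction_target(len(sequence)) - margin

# Hypothetical 100-residue sequences used only to exercise the check.
biased = "DK" * 30 + "LV" * 20      # 60% surface compatible
unbiased = "DK" * 20 + "LV" * 30    # 40% surface compatible
print(meets_target(biased))     # True  (0.60 >= 0.58 - 0.05)
print(meets_target(unbiased))   # False (0.40 <  0.58 - 0.05)
```

The check treats every residue outside the listed set as not surface compatible, which is a simplification made for this sketch.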

Thus, soluble polypeptides can be designed by selecting the fraction of amino acids that are hydrophilic or surface compatible. Once the amino acid content of the polypeptides is selected, polynucleotides encoding the polypeptides can be synthesized using a variety of methods known in the art. Preferred embodiments for preparation of polynucleotides are described in the next section. Other exemplary methods suitable for preparing the soluble polypeptides designed as described herein are provided in, without limitation, Doi, N. et al., Protein Engineering, Design & Selection vol. 18 no. 6 pp. 279-284, 2005; Keefe, A. D. and Szostak, J. W. (2001) Nature, 410, 715-718; Yamauchi, A. et al. (1998) FEBS Lett., 421, 147-151; Prijambada, I. et al. (1996) FEBS Lett., 382, 21-25; and Davidson, A. and Sauer, R. (1994) Proc. Natl Acad. Sci. USA, 91, 2146-2150, all of which are incorporated herein by reference in their entireties. In some embodiments, the polynucleotides encoding soluble polypeptides can then be used, for example, to prepare libraries of soluble polypeptides linked to polynucleotides.

Polynucleotide Preparation

In some embodiments, polynucleotides encoding soluble polypeptides are prepared using nucleotide triplets that correspond to the codons for the amino acids desired. Synthesis of the polynucleotides can be carried out using a variety of methods known in the art. For example, polynucleotides can be prepared using methods employing phosphoramidites, other chemical methods for constructing 3′ to 5′ polynucleotides, or enzymatically using trinucleotides and a suitable ligase, such as, for example, T4 DNA or RNA ligase. In some embodiments, phosphoramidite methods can be used to construct short sequences which can be enzymatically ligated to the desired length.

In various embodiments, phosphoramidite technology can be used to make full length polynucleotides. In some embodiments, it can be useful to use a CPG support that has a pore size greater than, for example, about 1000 Å which accommodates synthesis of oligos longer than about 150 nucleotides. Such CPG supports are commercially available (for example, Millipore and 3-Prime for 3000 Å, Glen Research for 2000 Å). There are various ways known in the art to add the trinucleotides together.

In some embodiments, polynucleotides can be prepared using modified phosphoramidite systems. For example, polynucleotides can be prepared using modified phosphoramidites such as codon phosphoramidites, or nucleotide trimers that are fully protected for use in DNA synthesizers (commercially available at, for example, Biolytics). By mixing the trimers in the desired hydrophobic/hydrophilic ratio, one can simply add from a reservoir with this ratio repeatedly during the synthesis of the random portion of the sequence.
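Drawing repeatedly from a reservoir of trimers mixed at a chosen ratio can be simulated as follows. This is a minimal Python sketch under several assumptions that are not in the disclosure: one standard sense codon is picked arbitrarily per amino acid, the eleven amino acids outside the surface compatible list are grouped as the "other" pool, trimers within each pool are equally likely, and every cycle couples successfully. The function name `draw_random_region` is hypothetical.

```python
import random

# One illustrative sense codon per amino acid (assumed choice, standard genetic code).
SURFACE_CODONS = {"D": "GAT", "R": "CGT", "S": "TCT", "Q": "CAG", "N": "AAC",
                  "K": "AAA", "E": "GAA", "G": "GGT", "P": "CCG"}
OTHER_CODONS = {"A": "GCT", "C": "TGC", "F": "TTC", "H": "CAT", "I": "ATC",
                "L": "CTG", "M": "ATG", "T": "ACC", "V": "GTT", "W": "TGG",
                "Y": "TAC"}
STOP_CODONS = {"TAA", "TAG", "TGA"}

def draw_random_region(n_codons, c, rng):
    """Build a random coding region by drawing a surface compatible trimer
    with probability c and a trimer from the other pool with probability 1 - c."""
    surface = list(SURFACE_CODONS.values())
    other = list(OTHER_CODONS.values())
    trimers = []
    for _ in range(n_codons):
        pool = surface if rng.random() < c else other
        trimers.append(rng.choice(pool))
    return "".join(trimers)

# 100 random codons at c = 0.58, the value the text gives for N = 100.
region = draw_random_region(100, 0.58, random.Random(0))
print(len(region))  # 300
# Because only sense trimers are in the reservoir, no in-frame stop codon can occur.
print(any(region[i:i + 3] in STOP_CODONS for i in range(0, len(region), 3)))  # False
```

Because whole codons are added, the reading frame is fixed by construction, and the realized surface compatible fraction of any one sequence scatters binomially around c.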

The coupling efficiency of the trimer phosphoramidites is generally only 80-90%, compared to the >99% for most phosphoramidite reagents. Capping with acetic anhydride can be used to prevent reading frame changes. However, if each sequence that fails to accept an addition is capped with acetic anhydride to remove it from further reactions, attrition can severely limit the length of many reagents. Thus, trimers used with conventional synthesis protocols tend to be very short. However, omitting the capping step without changing the reading frame can provide longer reagents. By replacing the capping step with a step that leaves the 5′ ends open to further addition, further additions can be possible without any attrition. Because some protocols use capping for other functions such as drying (see, for example, the Expedite 8900 Workstation Software User's Guide) and rectification of side chain modifications (Pon, R. et al. (1986) Nuc. Acids Res. 14, 6453-6470), additional procedures may be desirable when omitting the capping step.

Deletion of the capping step can result in a distribution of uncapped lengths, instead of a single length. This distribution also follows a binomial, with probabilities of addition and non-addition (assuming 80% coupling efficiency) of 0.8 and 0.2, respectively, so one would need to run 125 addition cycles to have an average of 100 random additions.
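The cycle count follows from the binomial mean, cycles × coupling efficiency. The following minimal Python sketch reproduces the 125-cycle figure and also reports the standard deviation of the number of additions, which the text does not state but which follows from the same binomial assumption. The function name `addition_stats` is illustrative.

```python
from math import sqrt

def addition_stats(cycles, coupling_efficiency):
    """Mean and standard deviation of the number of trimers added when each of
    `cycles` uncapped cycles succeeds independently with `coupling_efficiency`."""
    mean = cycles * coupling_efficiency
    sd = sqrt(cycles * coupling_efficiency * (1.0 - coupling_efficiency))
    return mean, sd

# 125 cycles at 80% coupling efficiency, as in the text.
mean, sd = addition_stats(125, 0.8)
print(round(mean, 2), round(sd, 2))  # 100.0 4.47
```

Under these assumptions most products would fall within roughly ten additions of the 100-trimer average, which gives a sense of the length spread to expect from an uncapped synthesis.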

In some embodiments, a “mock” capping step with methylimidazole and an aqueous step but no acetic anhydride capping reagent can be used to construct a library of random nucleotides (Pon, R. T., Usman, N., Damha, M., and Ogilvie, K. K. (1986) Nuc. Acids Res. 14, 6453-6470; J. Scott Eadie and D. Scott Davidson, Nucleic Acids Research 15:8333-8349 1987). An aqueous step is used because the added phosphoramidite can attach to the O6 of guanine bases, which later can lead to strand scission. The “mock” capping step or an aqueous step removes the modification from the guanine and prevents the scission.

In some embodiments, the polynucleotides do not include stop codons in the intended reading frame.
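A candidate sequence can be screened for in-frame stop codons with a short routine. The following Python sketch assumes the standard genetic code (stop codons TAA, TAG and TGA) and a sequence supplied as a DNA or RNA string; the function name and the frame_offset parameter are hypothetical and are introduced only for this illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code, DNA alphabet


def has_in_frame_stop(sequence, frame_offset=0):
    """Return True if any complete codon in the chosen reading frame is a stop codon.

    frame_offset is the index of the first base of the first codon (0, 1 or 2).
    RNA input is accepted by treating U as T.
    """
    seq = sequence.upper().replace("U", "T")
    for i in range(frame_offset, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return True
    return False


if __name__ == "__main__":
    print(has_in_frame_stop("ATGAAAGGG"))  # codons ATG AAA GGG
    print(has_in_frame_stop("ATGAAATAA"))  # codons ATG AAA TAA
```

Only the intended reading frame is examined, so a stop triplet that appears out of frame does not cause a sequence to be rejected.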

In various embodiments, the polynucleotides encoding soluble polypeptides can be used to prepare libraries of soluble polypeptides linked to polynucleotides. The soluble polypeptides can be linked to the polynucleotides encoding them by, for example without limitation, RNA display, ribosome display, phage display, etc. A variety of library formats can be used for the soluble peptide libraries.

Library Preparation

The libraries of the present invention can be prepared in a number of formats, including those described below. For each of the library formats, polynucleotides encoding the soluble polypeptides can be used in preparing the soluble polypeptide libraries. The polynucleotides can be synthesized as described in the previous section, or by a variety of methods known in the art.

In some embodiments, libraries of soluble random polypeptides are provided in which the polypeptides are encoded by mRNAs lacking stop codons and are linked to those mRNAs. In some embodiments, selection can be carried out on these polypeptide-mRNA fusions after dissociation of the ribosome. The methods disclosed in, for example, U.S. patent application Ser. No. 11/415,844 (which is incorporated herein by reference in its entirety) can be employed to produce polypeptide-mRNA fusions. In some embodiments, for example, a Linking tRNA Analog can be used to connect the mRNA to its cognate peptide. In some embodiments, the Linking tRNA Analog can be, for example, a native or a synthetic tRNA (or a native-synthetic hybrid) that has a crosslinker positioned on the anticodon loop. Preferably, the crosslinker is bound to the anticodon loop through covalent bonding. In some embodiments, the Linking tRNA Analog accepts the nascent peptide onto its 3′ aminoacyl moiety through the action of ribosomal peptidyl transferase. The 3′ aminoacyl moiety can be native to the tRNA or can be synthetically introduced. In some embodiments, the ester bond between the peptide and the tRNA is protected from ribosomal peptidyl transferase because the message is untranslatable beyond the codon bound by the tRNA (the linking codon). Thus, the ribosomal peptidyl transferase will be unable to release the peptide from the tRNA. Therefore, in several embodiments of the present invention, the ester bond between the tRNA and a peptide chain is rugged enough to obviate the need for puromycin. The connection between the Linking tRNA Analog and the peptide, when linked through an ester bond, is protected from dissolution by ribosomal peptidyl transferase by making the translated message “untranslatable” beyond the linking codon. Advantageously, the message then will be stably attached to its peptide for further identification, selection and evolution. Another advantage is that synthetic or modified tRNAs need not be used in some embodiments employing the Linking tRNA Analog.

In some particular embodiments, the tRNA is unmodified in the sense that it is unmodified on the 3′ end, and may or may not have minor modifications on the anticodon loop. In many embodiments, unmodified native tRNA (particularly unmodified on the 3′ end) can be used, therefore making the system, among other things, more cost-effective, efficient, quicker, less error-prone, and capable of producing a much higher yield. Not wishing to be bound by the following theory, the presence of puromycin (or similar linkers) can result, in some cases, in low yield because puromycin obstructs the interaction of the elongation factor with tRNA. Further, the elongation factor, when unobstructed by puromycin (or similar linkers), is able to accomplish dynamic proof-reading, thereby reducing error rates.

In addition to the RNA display library format described above, a number of cell-based library methods are available for the soluble polypeptide libraries, such as display on the surfaces of phages (Smith, G. P. (1985) Science 228, 1315-1317), bacteria (Georgiou, G., et al. (1993) TIBTECH 11, 6-10) and animal viruses (Kasahara, N. et al. (1994) Science 266, 1373-1376). In phage display, proteins or peptides are expressed individually on the surface of phage as fusions to a coat protein, while the same phage particle carries the DNA encoding the protein or peptide. Selection of the phage is achieved through a specific binding reaction involving recognition of the protein or peptide, enabling the particular phage to be isolated and cloned and the DNA for the protein or peptide to be recovered and propagated or expressed.

Another suitable library format for the soluble protein libraries disclosed herein is ribosome display. This format involves the display of polypeptides in nascent form on the surface of ribosomes, such that a stable complex with the encoding mRNA is also formed; the complexes are selected with a ligand for the protein or peptide and the genetic information obtained by reverse transcription of the isolated mRNA. This is known as ribosome or polysome display. A description of such a method can be found in two U.S. patents, granted to G. Kawasaki/Optein Inc. (Kawasaki, G., U.S. Pat. No. 5,643,768, Cell free synthesis and isolation of novel genes and polypeptides (Jul. 1, 1997) and U.S. Pat. No. 5,658,754 (Aug. 19, 1997), incorporated herein by reference in their entireties).

The various devices and systems described above provide a number of ways to carry out the invention. It is to be understood that not necessarily all objectives or advantages described can be achieved in accordance with any particular embodiment described herein. Also, although the invention has been disclosed in the context of certain embodiments and examples, it will be understood by those skilled in the art that the invention extends beyond the specifically disclosed embodiments to other alternative embodiments and/or uses and obvious modifications and equivalents thereof. Accordingly, the invention is not intended to be limited by the specific disclosures of preferred embodiments herein.

1. (canceled)
2. A library of soluble polypeptides linked to polynucleotides encoding the polypeptides, comprising: a plurality of polynucleotides, each of the plurality of polynucleotides comprising a plurality of codons, wherein a proportion of hydrophilic or surface compatible amino acid residues encoded by said codons of said each of the plurality of polynucleotides satisfies the following formula: c ≥ (8/0.41N)^(1/3) − 0.05, where N=the number of amino acids in said each of the plurality of polynucleotides, and c=a fraction of amino acids that are hydrophilic or surface compatible in said each of the plurality of polynucleotides; and an encoded polypeptide linked to each of said each of the plurality of polynucleotides, wherein an amino acid sequence of the encoded polypeptide is encoded by the each of said each of the plurality of polynucleotides that is linked to the encoded polypeptide.